Network stack specialization for performance
Contemporary network stacks are masterpieces of generality, supporting a range of edge-node and middle-node functions. This generality comes at significant performance cost: current APIs, memory models, and implementations drastically limit the effectiveness of increasingly powerful hardware. Generality has historically been required to allow individual systems to perform many functions. However, as providers have scaled up services to support hundreds of millions of users, they have transitioned toward many thousands (or even millions) of dedicated servers performing narrow ranges of functions. We argue that the overhead of generality is now a key obstacle to effective scaling, making specialization not only viable, but necessary. This paper presents Sandstorm, a clean-slate userspace network stack that exploits knowledge of web server semantics, improving throughput over current off-the-shelf designs while retaining use of conventional operating-system and programming frameworks. Based on Netmap, our novel approach merges application and network-stack memory models, aggressively amortizes stack-internal TCP costs based on application-layer knowledge, tightly couples with the NIC event model, and exploits low-latency hardware access. We compare our approach to the FreeBSD and Linux network stacks with nginx as the web server, demonstrating a ∼3.5× throughput improvement with low CPU utilization, linear scaling on multicore systems, and saturation of current NIC hardware.
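One concrete instance of "amortizing stack-internal TCP costs based on application-layer knowledge" is pre-packetizing static content: an object is split into MSS-sized segments and their checksum contributions are computed once, then reused for every connection that requests the same object. A minimal Python sketch of the idea (the class and the 16-bit ones'-complement helper are illustrative assumptions, not Sandstorm's actual API):

```python
import array

MSS = 1448  # typical TCP maximum segment size on Ethernet

def ones_complement_sum(data: bytes) -> int:
    """16-bit ones'-complement sum, the arithmetic used by TCP checksums."""
    if len(data) % 2:
        data += b"\x00"
    s = sum(array.array("H", data))
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)
    return s & 0xFFFF

class PrePacketizedFile:
    """Split static content into MSS-sized segments once; reuse the
    segments (and their checksum sums) for every connection instead of
    re-segmenting and re-checksumming per request."""
    def __init__(self, content: bytes):
        self.segments = [
            (content[i:i + MSS], ones_complement_sum(content[i:i + MSS]))
            for i in range(0, len(content), MSS)
        ]

    def segment_count(self) -> int:
        return len(self.segments)

f = PrePacketizedFile(b"x" * 4000)
print(f.segment_count())  # 4000 bytes -> 3 segments at MSS=1448
```

The per-connection work then reduces to filling in connection-specific header fields, which is the amortization the abstract alludes to.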
Balancing Disruption and Deployability in the CHERI Instruction-Set Architecture (ISA)
For over two-and-a-half decades, dating to the first widespread commercial deployment of the Internet, commodity processor architectures have failed to provide robust and secure foundations for communication and commerce. This is in large part due to the omission of architectural features allowing efficient implementation of the Principle of Least Privilege, which dictates that software runs only with the rights it requires to operate [19, 20]. Without this support, the impact of inevitable vulnerabilities is multiplied as successful attackers gain easy access to unnecessary rights – and often, all rights – in software systems.
CHERI Macaroons: Efficient, host-based access control for cyber-physical systems
Cyber-Physical Systems (CPS) often rely on network boundary defence as a primary means of access control; therefore, the compromise of one device threatens the security of all devices within the boundary. Resource and real-time constraints, tight hardware/software coupling, and decades-long service lifetimes complicate efforts for more robust, host-based access control mechanisms. Distributed capability systems provide opportunities for restoring access control to resource-owning devices; however, such a protection model requires a capability-based architecture for CPS devices as well as task compartmentalisation to be effective.
This paper demonstrates hardware enforcement of network bearer tokens using an efficient translation between CHERI (Capability Hardware Enhanced RISC Instructions) architectural capabilities and Macaroon network tokens. While this method appears to generalise to any network-based access control problem, we specifically consider CPS, as our method is well-suited for controlling resources in the physical domain. We demonstrate the method in a distributed robotics application and in a hierarchical industrial control application, and discuss our plans to evaluate and extend the method. Gates Cambridge Scholarship.
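The Macaroon side of this translation rests on a simple construction: a token's signature is an HMAC chain over its identifier and caveats, so any holder can attenuate the token by appending a caveat, but only a principal with the root key can mint or verify one. A minimal Python sketch of that chaining (field names and predicates are illustrative; third-party caveats and the CHERI capability translation itself are omitted):

```python
import hmac, hashlib

def chain(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def mint(root_key: bytes, identifier: bytes, caveats):
    """sig0 = HMAC(root_key, id); sig_i = HMAC(sig_{i-1}, caveat_i)."""
    sig = chain(root_key, identifier)
    for c in caveats:
        sig = chain(sig, c)
    return {"id": identifier, "caveats": list(caveats), "sig": sig}

def verify(root_key: bytes, macaroon, predicate_holds) -> bool:
    """Recompute the HMAC chain and check every caveat predicate."""
    sig = chain(root_key, macaroon["id"])
    for c in macaroon["caveats"]:
        if not predicate_holds(c):
            return False
        sig = chain(sig, c)
    return hmac.compare_digest(sig, macaroon["sig"])

m = mint(b"secret", b"robot-arm-7", [b"op = read", b"zone = cell-3"])
print(verify(b"secret", m, lambda c: True))   # True
print(verify(b"wrong",  m, lambda c: True))   # False: wrong root key
```

Because each caveat only narrows authority, the construction maps naturally onto CHERI's monotonic capability derivation, which is what makes an efficient translation plausible.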
Structural analysis of whole-system provenance graphs
System-based provenance generates traces captured from various systems; a natural way to represent and reason about these traces is as a graph. These graphs are not well understood, and current work focuses on their extraction and processing without a thorough characterization being in place. This paper studies the topology of such graphs. We analyze multiple Whole-system-Provenance graphs and show that they exhibit a hubs-and-authorities structure as well as a power-law degree distribution. Our observations allow for a novel understanding of the structure of Whole-system-Provenance graphs. DARPA.
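The structural claims above (hubs-and-authorities, power-law degree distribution) can be probed with a few lines of analysis: compute in-degrees and inspect the degree histogram, whose tail falls roughly linearly on a log-log scale under a power law. A toy sketch on a synthetic preferential-attachment graph (the graph here is invented for illustration, not a real provenance trace):

```python
import random
from collections import Counter

random.seed(0)

# Synthetic directed graph grown by preferential attachment: new nodes
# tend to point at already-popular targets, yielding heavy-tailed
# in-degrees (a crude stand-in for hub formation in provenance graphs).
edges = [(1, 0)]
targets = [0, 1]          # multiset of endpoints; sampling it is preferential
for node in range(2, 2000):
    dst = random.choice(targets)
    edges.append((node, dst))
    targets.extend([dst, node])

in_degree = Counter(dst for _, dst in edges)
degree_freq = sorted(Counter(in_degree.values()).items())

# Under a power law, log(#nodes) falls roughly linearly in log(degree);
# printing the histogram is enough to eyeball the heavy tail.
print("edges:", len(edges), "max in-degree:", max(in_degree.values()))
print("degree -> #nodes:", degree_freq[:5], "...")
```

A real characterization would fit the exponent (e.g. by maximum likelihood) rather than eyeballing, but the pipeline, graph to degree counts to tail shape, is the same.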
Into the depths of C: Elaborating the de facto standards
C remains central to our computing infrastructure. It is notionally defined by ISO standards, but in reality the properties of C assumed by systems code and those implemented by compilers have diverged, both from the ISO standards and from each other, and none of these are clearly understood. We make two contributions to help improve this error-prone situation. First, we describe an in-depth analysis of the design space for the semantics of pointers and memory in C as it is used in practice. We articulate many specific questions, build a suite of semantic test cases, gather experimental data from multiple implementations, and survey what C experts believe about the de facto standards. We identify questions where there is a consensus (either following ISO or differing) and where there are conflicts. We apply all this to an experimental C implemented above capability hardware. Second, we describe a formal model, Cerberus, for large parts of C. Cerberus is parameterised on its memory model; it is linkable either with a candidate de facto memory object model, under construction, or with an operational C11 concurrency model; it is defined by elaboration to a much simpler Core language for accessibility, and it is executable as a test oracle on small examples. This should provide a solid basis for discussion of what mainstream C is now: what programmers and analysis tools can assume and what compilers aim to implement. Ultimately we hope it will be a step towards clear, consistent, and accepted semantics for the various use-cases of C. We acknowledge funding from EPSRC grants EP/H005633 (Leadership Fellowship, Sewell) and EP/K008528 (REMS Programme Grant), and a Gates Cambridge Scholarship (Nienhuis). This work is also part of the CTSRD projects sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contract FA8750-10-C-0237. This is the author accepted manuscript.
The final version is available from the Association for Computing Machinery via http://dx.doi.org/10.1145/2908080.290808
Queues don't matter when you can JUMP them!
QJUMP is a simple and immediately deployable approach to controlling network interference in datacenter networks. Network interference occurs when congestion from throughput-intensive applications causes queueing that delays traffic from latency-sensitive applications. To mitigate network interference, QJUMP applies Internet QoS-inspired techniques to datacenter applications. Each application is assigned to a latency sensitivity level (or class). Packets from higher levels are rate-limited in the end host, but once allowed into the network can "jump-the-queue" over packets from lower levels. In settings with known node counts and link speeds, QJUMP can support service levels ranging from strictly bounded latency (but with low rate) through to line-rate throughput (but with high latency variance).

We have implemented QJUMP as a Linux Traffic Control module. We show that QJUMP achieves bounded latency and reduces in-network interference by up to 300×, outperforming Ethernet Flow Control (802.3x), ECN (WRED) and DCTCP. We also show that QJUMP improves average flow completion times, performing close to or better than DCTCP and pFabric. This work was supported by a Google Fellowship, EPSRC INTERNET Project EP/H040536/1, Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory (AFRL), under contract FA8750-11-C-0249. This is the final published version. It first appeared at https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/grosvenor
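The end-host mechanism described above, per-level rate limiting before packets enter the network, is essentially a token bucket per latency level: high-priority levels receive small rate allocations, but their packets, once admitted, are prioritized over lower levels in the network. A simplified Python sketch (the levels, rates, and burst sizes are invented for illustration; QJUMP's real implementation is a Linux Traffic Control module in the kernel):

```python
class TokenBucket:
    """Admit a packet only if enough tokens have accrued at `rate`
    bytes/sec, up to a maximum `burst` of stored tokens."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, 0.0

    def admit(self, size, now):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False

# Level 7: strict rate limit, but its packets jump queues in the fabric.
# Level 0: effectively line rate, with no latency guarantee.
levels = {7: TokenBucket(rate=10_000, burst=1_500),
          0: TokenBucket(rate=1_250_000_000, burst=64_000)}

print(levels[7].admit(1500, now=0.0))  # True: burst covers one packet
print(levels[7].admit(1500, now=0.0))  # False: bucket drained, no time passed
print(levels[0].admit(1500, now=0.0))  # True: large allocation
```

The trade-off in the abstract falls out directly: the tighter the bucket at a level, the less that level can contribute to in-network queueing, and the tighter the latency bound it can be given.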
Disk|Crypt|Net: rethinking the stack for high-performance video streaming
Conventional operating systems used for video streaming employ an in-memory disk buffer cache to mask the high latency and low throughput of disks. However, data from Netflix servers show that this cache has a low hit rate, so does little to improve throughput. Latency is not the problem it once was either, due to PCIe-attached flash storage. With memory bandwidth increasingly becoming a bottleneck for video servers, especially when end-to-end encryption is considered, we revisit the interaction between storage and networking for video streaming servers in pursuit of higher performance.

We show how to build high-performance userspace network services that saturate existing hardware while serving data directly from disks, with no need for a traditional disk buffer cache. Employing netmap, and developing a new diskmap service, which provides safe high-performance userspace direct I/O access to NVMe devices, we amortize system overheads by utilizing efficient batching of outstanding I/O requests, process-to-completion, and zero-copy operation. We demonstrate how a buffer-cache-free design is not only practical, but required in order to achieve efficient use of memory bandwidth on contemporary microarchitectures. Minimizing latency between DMA and CPU access by integrating storage and TCP control loops allows many operations to access only the last-level cache rather than bottlenecking on memory bandwidth. We illustrate the power of this design by building Atlas, a video streaming web server that outperforms state-of-the-art configurations and achieves ~72 Gbps of plaintext or encrypted network traffic using a fraction of the available CPU cores on commodity hardware.
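The "efficient batching of outstanding I/O requests" combined with process-to-completion can be illustrated with a toy queue: requests accumulate and are submitted in one call per batch, so the fixed per-submission cost (a syscall or NVMe doorbell write) is paid once per batch rather than once per request. A schematic Python model (the batch size and the notion of a "submission" are placeholders, not diskmap's NVMe interface):

```python
class BatchedQueue:
    """Accumulate I/O requests and submit each full batch in one call,
    processing it to completion, so the fixed per-submission overhead
    is amortized over the whole batch."""
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []
        self.submissions = 0   # one per "syscall"/doorbell, not per request
        self.completed = 0

    def enqueue(self, req):
        self.pending.append(req)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.submissions += 1
            self.completed += len(self.pending)  # process-to-completion
            self.pending.clear()

q = BatchedQueue(batch_size=32)
for req in range(1000):
    q.enqueue(req)
q.flush()
print(q.submissions, q.completed)  # 32 submissions cover 1000 requests
```

With a batch size of 32, per-request overhead drops to 1/32 of the per-call cost, which is the amortization that lets a userspace stack keep NVMe devices and NICs saturated.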
Firmament: Fast, Centralized Cluster Scheduling at Scale
Centralized datacenter schedulers can make high-quality placement decisions when scheduling tasks in a cluster. Today, however, high-quality placements come at the cost of high latency at scale, which degrades response time for interactive tasks and reduces cluster utilization. This paper describes Firmament, a centralized scheduler that scales to over ten thousand machines at sub-second placement latency even though it continuously reschedules all tasks via a min-cost max-flow (MCMF) optimization. Firmament achieves low latency by using multiple MCMF algorithms, by solving the problem incrementally, and via problem-specific optimizations. Experiments with a Google workload trace from a 12,500-machine cluster show that Firmament improves placement latency by 20× over Quincy [22], a prior centralized scheduler using the same MCMF optimization. Moreover, even though Firmament is centralized, it matches the placement latency of distributed schedulers for workloads of short tasks. Finally, Firmament exceeds the placement quality of four widely-used centralized and distributed schedulers on a real-world cluster, and hence improves batch task response time by 6×. This work was supported by a Google European Doctoral Fellowship, by NSF award CNS-1413920, and by the Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory (AFRL), under contract FA8750-11-C-0249.
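The MCMF formulation at the core of this line of work maps scheduling onto a flow network: each unit of flow routed from a task through a machine to the sink is a placement, and arc costs encode preferences such as data locality. A toy sketch with a textbook successive-shortest-path solver (for intuition only; node names and costs are invented, and Firmament's actual solvers and incremental techniques are far more sophisticated):

```python
from collections import defaultdict

class MinCostMaxFlow:
    """Textbook successive-shortest-path min-cost max-flow with
    Bellman-Ford relaxation. Fine for toy sizes; Firmament's
    contribution is making MCMF scheduling fast at datacenter scale."""
    def __init__(self):
        self.adj = defaultdict(list)  # node -> indices into self.edges
        self.edges = []               # [to, residual_cap, cost]; edge i^1 is i's reverse

    def add_edge(self, u, v, cap, cost):
        self.adj[u].append(len(self.edges)); self.edges.append([v, cap, cost])
        self.adj[v].append(len(self.edges)); self.edges.append([u, 0, -cost])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            dist, parent = {s: 0}, {}
            while True:               # Bellman-Ford over the residual graph
                updated = False
                for u in list(dist):
                    for ei in self.adj[u]:
                        v, cap, c = self.edges[ei]
                        if cap > 0 and dist[u] + c < dist.get(v, float("inf")):
                            dist[v], parent[v] = dist[u] + c, ei
                            updated = True
                if not updated:
                    break
            if t not in dist:
                return flow, cost     # no augmenting path left
            push, v = float("inf"), t # find bottleneck capacity on the path
            while v != s:
                push = min(push, self.edges[parent[v]][1])
                v = self.edges[parent[v] ^ 1][0]
            v = t                     # apply the augmentation
            while v != s:
                self.edges[parent[v]][1] -= push
                self.edges[parent[v] ^ 1][1] += push
                v = self.edges[parent[v] ^ 1][0]
            flow += push
            cost += push * dist[t]

# Toy instance: tasks T0, T1; machines M0, M1 (capacity 1 each);
# arc costs model placement preference (e.g. data locality).
g = MinCostMaxFlow()
g.add_edge("S", "T0", 1, 0); g.add_edge("S", "T1", 1, 0)
g.add_edge("T0", "M0", 1, 1); g.add_edge("T0", "M1", 1, 5)
g.add_edge("T1", "M0", 1, 2); g.add_edge("T1", "M1", 1, 1)
g.add_edge("M0", "K", 1, 0); g.add_edge("M1", "K", 1, 0)
flow, cost = g.solve("S", "K")
print(flow, cost)  # 2 tasks placed at total cost 2 (T0->M0, T1->M1)
```

The scheduler reads placements off the saturated task-to-machine arcs; rescheduling all tasks is just re-solving this (very large) flow problem, which is why solver latency dominates.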
CHERI: a research platform deconflating hardware virtualisation and protection
Contemporary CPU architectures conflate virtualization and protection, imposing virtualization-related performance, programmability, and debuggability penalties on software requiring fine-grained protection. First observed in micro-kernel research, these problems are increasingly apparent in recent attempts to mitigate software vulnerabilities through application compartmentalisation. Capability Hardware Enhanced RISC Instructions (CHERI) extend RISC ISAs to support greater software compartmentalisation. CHERI's hybrid capability model provides fine-grained compartmentalisation within address spaces while maintaining software backward compatibility, which will allow the incremental deployment of fine-grained compartmentalisation in both our most trusted and least trustworthy C-language software stacks. We have implemented a 64-bit MIPS research soft core, BERI, as well as a capability coprocessor, and begun adapting commodity software packages (FreeBSD and Chromium) to execute on the platform.
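The capability model can be illustrated, very loosely, in software: a capability is an unforgeable token carrying base, length, and permissions; every dereference is checked against them, and derived capabilities can only shrink. A toy Python model (CHERI enforces all of this in hardware on every instruction; the class and method names here are illustrative):

```python
class CapabilityViolation(Exception):
    pass

class Capability:
    """Toy model: base/length bounds plus load/store permissions,
    checked on every access, with monotonic (shrink-only) derivation."""
    def __init__(self, memory, base, length, perms=frozenset({"load", "store"})):
        self._mem, self.base, self.length, self.perms = memory, base, length, perms

    def _check(self, offset, perm):
        if perm not in self.perms or not (0 <= offset < self.length):
            raise CapabilityViolation(f"{perm} at offset {offset}")

    def load(self, offset):
        self._check(offset, "load")
        return self._mem[self.base + offset]

    def store(self, offset, value):
        self._check(offset, "store")
        self._mem[self.base + offset] = value

    def restrict(self, base_off, length, perms=None):
        """Derive a narrower capability; bounds and perms only shrink."""
        if base_off < 0 or base_off + length > self.length:
            raise CapabilityViolation("cannot grow bounds")
        return Capability(self._mem, self.base + base_off, length,
                          self.perms if perms is None else self.perms & perms)

mem = bytearray(64)
cap = Capability(mem, base=16, length=16)
cap.store(0, 0xAB)
sub = cap.restrict(4, 4, frozenset({"load"}))
print(cap.load(0))            # 171
try:
    sub.store(0, 1)           # derived capability lacks store permission
except CapabilityViolation as e:
    print("trap:", e)
```

The "hybrid" aspect is that legacy code keeps using plain pointers within a capability-delimited region, so compartmentalisation can be adopted incrementally rather than all at once.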
CHERIvoke: Characterising pointer revocation using CHERI capabilities for temporal memory safety
A lack of temporal safety in low-level languages has led to an epidemic of use-after-free exploits. These have surpassed in number and severity even the infamous buffer-overflow exploits violating spatial safety. Capability addressing can directly enforce spatial safety for the C language by enforcing bounds on pointers and by rendering pointers unforgeable. Nevertheless, an efficient solution for strong temporal memory safety remains elusive.
CHERI is an architectural extension to provide hardware capability addressing that is seeing significant commercial and open-source interest. We show that CHERI capabilities can be used as a foundation to enable low-cost heap temporal safety by facilitating out-of-date pointer revocation, as capabilities enable precise and efficient identification and invalidation of pointers, even when using unsafe languages such as C. We develop CHERIvoke, a technique for deterministic and fast sweeping revocation to enforce temporal safety on CHERI systems. CHERIvoke quarantines freed data before periodically using a small shadow map to revoke all dangling pointers in a single sweep of memory, and provides a tunable trade-off between performance and heap growth. We evaluate the performance of such a system using high-performance x86 processors, and further analytically examine its primary overheads. When configured with a heap-size overhead of 25%, we find that CHERIvoke achieves an average execution-time overhead of under 5%, far below the overheads associated with traditional garbage collection, revocation, or page-table systems. EP/K026399/1, EP/P020011/1, EP/K008528/
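The quarantine-and-sweep loop described above can be sketched in miniature: freed regions go to quarantine instead of being reused; when quarantine exceeds a threshold (the tunable heap-growth trade-off), one sweep over memory invalidates every pointer into quarantined regions, after which that memory is safe to reuse. A toy Python model (CHERIvoke does this with a shadow map over hardware capabilities; this simulation uses plain address ranges and a pointer table):

```python
class SweepingAllocator:
    """Freed blocks are quarantined; once quarantine exceeds its limit,
    a sweep revokes (nulls out) all pointers into quarantined ranges
    before that memory may be reused."""
    def __init__(self, quarantine_limit):
        self.next_addr = 0
        self.quarantine = []           # (base, size) pairs awaiting revocation
        self.quarantined_bytes = 0
        self.quarantine_limit = quarantine_limit
        self.pointers = {}             # name -> address: the "memory" we sweep

    def alloc(self, name, size):
        base = self.next_addr
        self.next_addr += size
        self.pointers[name] = base
        return base

    def free(self, base, size):
        self.quarantine.append((base, size))
        self.quarantined_bytes += size
        if self.quarantined_bytes >= self.quarantine_limit:
            self.sweep()

    def sweep(self):
        """One pass over all pointers: revoke any targeting quarantine."""
        for name, addr in self.pointers.items():
            if any(b <= addr < b + s for b, s in self.quarantine):
                self.pointers[name] = None  # dangling pointer dies here
        self.quarantine.clear()
        self.quarantined_bytes = 0

a = SweepingAllocator(quarantine_limit=64)
p = a.alloc("p", 32)
q = a.alloc("q", 32)
a.free(p, 32)                   # below the limit: "p" still dangles
print(a.pointers["p"] is None)  # False — not yet revoked
a.free(q, 32)                   # hits the limit, triggers a sweep
print(a.pointers["p"] is None)  # True — dangling pointer revoked
```

The quarantine limit is exactly the heap-size/performance knob in the abstract: a larger quarantine means rarer sweeps (less time overhead) at the cost of more dead memory held back from reuse.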